Typically, when working with regular time-series data, we expect some degree of correlation between the series and its past lags. Weather data, such as temperature measured over time, is a good example of this type of relationship. In most cases, you should expect a high correlation between the temperature and both its first lags (e.g., the preceding hour) and its seasonal lags (e.g., the same hour on the previous day). This correlation, when it exists, has predictive power that can be used to forecast future observations of the series. The most common example is the ARIMA family of models (e.g., AR, MA, ARMA, ARIMA), which quantify the relationship between the past lags of both the series and the error term and the future values of the series. Therefore, prior to building a forecasting model, you may want to conduct a correlation analysis. The TSstudio package provides a set of functions for correlation analysis of regular time-series data.
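To make the idea of first and seasonal lags concrete, here is a minimal sketch in base R. The hourly series is simulated (it is not real temperature data), and lag_cor is a hypothetical helper written for this illustration. It computes a plain Pearson correlation between the series and a shifted copy of itself, which is close to, but not identical to, the estimator used by acf.

```r
# Simulated hourly series with a 24-hour cycle (illustrative data only)
set.seed(123)
n <- 24 * 60
hours <- seq_len(n)
temp <- 15 + 8 * sin(2 * pi * hours / 24) + rnorm(n, sd = 1)

# Hypothetical helper: Pearson correlation between a series and its k-th lag
lag_cor <- function(x, k) {
  n <- length(x)
  cor(x[(k + 1):n], x[1:(n - k)])
}

lag_cor(temp, 1)   # first lag (the preceding hour)
lag_cor(temp, 24)  # seasonal lag (same hour on the previous day)
lag_cor(temp, 12)  # half a cycle away: strongly negative
```

In this simulated series, both the first lag and the seasonal lag are strongly and positively correlated with the series, which is the pattern a forecasting model can exploit.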

The ACF and PACF functions

The most common tools for correlation analysis of time-series data are the autocorrelation function (ACF) and the partial autocorrelation function (PACF). While both measure the correlation between the series and its lags, the PACF removes the effect of the intermediate lags (lags 1 through n - 1) when calculating the correlation between the series and lag n itself.
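The difference between the two functions is easiest to see on a series whose structure is known. The sketch below assumes a simulated AR(1) process with a coefficient of 0.7 (an arbitrary choice for illustration). Each observation depends directly only on the previous one, so the ACF should decay gradually across lags while the PACF should be close to zero after the first lag.

```r
set.seed(42)
# Simulated AR(1) process with coefficient 0.7 (illustrative data only)
x <- arima.sim(model = list(ar = 0.7), n = 5000)

acf_vals  <- acf(x, lag.max = 5, plot = FALSE)
pacf_vals <- pacf(x, lag.max = 5, plot = FALSE)

# acf() reports lag 0 first, so drop it to line up with pacf()
cor_table <- data.frame(lag  = 1:5,
                        acf  = as.numeric(acf_vals$acf)[-1],
                        pacf = as.numeric(pacf_vals$acf))
round(cor_table, 2)
```

The correlation of the series with lag 2 shows up in the ACF only because lag 2 is correlated with lag 1. Once the effect of lag 1 is removed, the PACF at lag 2 is near zero.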

In R, the base method for calculating and plotting the ACF and PACF values of a series is the acf and pacf functions, respectively, from the stats package. The ts_cor function, from the TSstudio package, provides an interactive, colorful, and flexible wrapper for the acf and pacf functions. Before jumping to the ts_cor function, let’s review the acf and pacf functions using the daily demand for electricity in the UK, which is available in the UKgrid package. We will use the extract_grid function from the UKgrid package to pull the series from 2017 onward, at a daily frequency and in ts format:

library(TSstudio)
library(UKgrid)

ts.obj <- extract_grid(type = "ts", 
                       columns = "ND", 
                       start = 2017,
                       aggregate = "daily")

We will use the ts_info and ts_plot functions to view the main characteristics of the series and plot it, respectively:

ts_info(ts.obj = ts.obj)
##  The ts.obj series is a ts object with 1 variable and 1011 observations
##  Frequency: 365 
##  Start time: 2017 1 
##  End time: 2019 281
ts_plot(ts.obj)

Let’s now use the acf and pacf functions to plot the ACF and PACF values of the series (we will use the par function to plot the two together):

par(mfrow=c(2,1))
acf(ts.obj)
pacf(ts.obj)
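When called without a lag.max argument, acf and pacf return only a limited number of lags, and for a ts object the lag axis is expressed in the time units of the series (years here) rather than in observations. The sketch below illustrates this on a simulated daily series with a weekly pattern, used as a stand-in so the example runs without the UKgrid data. The series and its parameters are assumptions made for illustration.

```r
set.seed(1)
# Simulated daily series with a weekly pattern (illustrative stand-in for ts.obj)
n <- 1011
weekly <- rep(c(1, 1, 1, 1, 1, -2, -3), length.out = n)
x <- ts(10 + weekly + rnorm(n, sd = 0.5),
        start = c(2017, 1), frequency = 365)

# plot = FALSE returns the values without drawing the plot
a <- acf(x, plot = FALSE)
length(a$acf) - 1          # number of lags returned by default
max(a$lag) * frequency(x)  # largest lag, converted from years to days

# lag.max is given in observations; 365 covers a full year of daily lags
a_year <- acf(x, lag.max = 365, plot = FALSE)
length(a_year$acf) - 1
```

With a daily series, the default window covers roughly a month of lags, so setting lag.max explicitly is needed to inspect yearly seasonality.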

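# Interactive ACF and PACF plots with ts_cor, covering two years of daily lags
ts_cor(ts.obj = ts.obj, lag.max = 365 * 2)

# Same plot, additionally highlighting the weekly cycle (every 7th lag)
ts_cor(ts.obj = ts.obj, seasonal_lags = 7, lag.max = 365 * 2)

# USgas: monthly series shipped with the TSstudio package
data(USgas)

# ACF and PACF of the monthly series over five years of lags
acf(USgas, lag.max = 12 * 5)

pacf(USgas, lag.max = 12 * 5)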